Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PEP 558 - WIP] bpo-30744: Trace hooks no longer reset closure state #3640

Closed

Conversation

ncoghlan
Copy link
Contributor

@ncoghlan ncoghlan commented Sep 18, 2017

(Declined as per the PEP deferral notice in https://github.com/python/peps/pull/3050/files - this reference implementation was useful at the time, but after the frame management changes in Python 3.11 any implementation of the updated semantics in PEP 558 or PEP 667 would be best off starting from scratch. The test cases from this branch might still be useful, but those can always be copied out to a new branch easily enough)


Previously, trace hooks running on a nested function
could incorrectly reset the state of a closure cell.

This avoids that by changing the semantics of the
f_locals namespace on function frames to use a
write-through proxy, such that changes are made
immediately rather than being written back some
arbitrary time later.

PEP 558 and bpo-17960 cover additional changes
to locals() and frame.f_locals semantics that are
part of this update.

Remaining TODO items before this PR could be considered complete:

  • Resolve WIP bpo-44800: Rename _PyInterpreterFrame to _Py_framedata #27525 and resync this PR with the main branch
  • Update PEP 558 to cover the performance improvements made to the proxy implementation (lazy initial cache refresh, share a single fast refs mapping between all proxy instances for a given frame)
  • Merge in Python 3.11 frame implementation changes
  • Fix fast locals proxy implementation to account for Python 3.11 frame implementation changes
  • Fix fast locals proxy implementation to follow the design changes in PEP 558: Make fast locals proxy independent of the legacy dynamic snapshot peps#1787
  • Stop implicitly updating the local variable snapshot when running Python tracing functions
  • Python API documentation updates for locals(), vars(), and potentially others (check all locals mentions)
  • docstring updates for locals() (and potentially vars())
  • PyEval_* C API documentation updates
  • PyFrame_* C API documentation updates
  • Fix C API header layout to follow modern conventions (bpo-35134: Migrate frameobject.h contents to cpython/frameobject.h #18052)
  • When rearranging headers, fix the clang warning/error about PyFrameObject redefinition
  • Add refcount adjustment info for new C APIs
  • Migrate eval() default locals namespace to PyLocals_Get()
  • Migrate exec() default locals namespace to PyLocals_Get()
  • Report an error if IMPORT_STAR opcode is encountered in a CO_OPTIMIZED frame
  • Add an explicit test for the new PyFrame_LocalsToFast() runtime error
  • Add explicit C API tests for PyLocals_Get() and PyEval_GetLocals() at different scopes
  • Implement and test the other new PyLocals and PyFrame functions
  • Implement and test the following methods on the fast locals proxy:
    • setdefault()
    • popitem()
    • clear()
    • update() (theoretically already implemented, but needs explicit tests)
  • Add explicit tests for the behaviour of proxies that reference a cleared frame

Desirable code structure improvements (current plan for these is to put any duplicated into a sharable location in this PR, but file separate maintainability issues to migrate to using the shared code in the original locations, to avoid the diff size impact on this PR):

  • Deduplicate the mutable mapping code copied from odictobject.c
  • Move the DictProxy code out of descrobject.c (obsolete task, as frame locals proxy is now independent of mapping proxy)

https://bugs.python.org/issue30744

@ncoghlan ncoghlan changed the title [DO NOT MERGE - PEP 558] bpo-30744: Trace hooks no longer reset closure state [PEP 558 - DO NOT MERGE] bpo-30744: Trace hooks no longer reset closure state Sep 18, 2017
Previously, trace hooks running on a nested function
could incorrectly reset the state of a closure cell.

This avoids that by slightly changing the semantics
of the locals namespace in functions, generators,
and coroutines, such that closure references are
represented in the locals namespace by their
actual cell objects while a trace hook implemented
in Python is running.

Depends on PEP 558 and bpo-17960 to cover the
underlying change to frame.f_locals semantics.
@ncoghlan ncoghlan force-pushed the bpo-30744-make-locals-closure-safe branch from 05a351e to 7626a0e Compare November 5, 2017 05:28
Objects/frameobject.c Outdated Show resolved Hide resolved
Copy link

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase needed and build failures needed addressing.

@ncoghlan
Copy link
Contributor Author

@njsmith Aside from the TODO's in the PR itself, this is a pretty complete implementation of the current version of PEP 558 now.

@ncoghlan ncoghlan requested a review from njsmith May 22, 2019 14:37
@ncoghlan
Copy link
Contributor Author

One other known issue: the header file updates still need to be modified to conform with the new conventions.

@scotchka
Copy link
Contributor

I don't know if this is already addressed elsewhere, but I'm getting a strange error. Running the below script with python.exe built with this branch (I removed an unused parameter to make it compile):

def foo():
    import pdb; pdb.set_trace()
    x=1

foo()

gives

$ ./python.exe jump.py
> /Users/henry/pep558/cpython/jump.py(3)foo()
-> x=1
(Pdb) x=123
(Pdb) c
python.exe(69485,0x10961d5c0) malloc: *** error for object 0x1029ee790: pointer being freed was not allocated
python.exe(69485,0x10961d5c0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

but this error is absent for certain values of x, say:

$ ./python.exe jump.py
> /Users/henry/pep558/cpython/jump.py(3)foo()
-> x=1
(Pdb) x=1234
(Pdb) c
$

@ncoghlan
Copy link
Contributor Author

Hmm, looks like I may have a refcounting bug, and the small integer cache is able to provoke it into crashing.

@scotchka
Copy link
Contributor

scotchka commented May 26, 2019

I can report that not all small numbers trigger this error. On my machine x = 11, say, is fine. Most perplexing!

@ncoghlan
Copy link
Contributor Author

Running test_frame consistently triggers a crash in test_clear_locals for me, so I'm focusing on that initially.

@ncoghlan
Copy link
Contributor Author

Last push resolves the test_frame segfault, but adds a TODO noting that the locals proxy may still misbehave after a frame has been cleared (or is in the process of being destroyed).

There's also a threading test which checks reference cycles aren't being created through the frames, which is currently failing (quite possibly for a related reason)

@ncoghlan
Copy link
Contributor Author

Ugh, the refcyle issue detected by test_threading is kinda inherent in the current implementation: once frame.f_locals gets created, there's now a cyclic reference between the frame and its own locals proxy.

I'm going to try forcibly breaking the cycle when an optimized frame exits (before the eval loop drops its own reference).

@ncoghlan
Copy link
Contributor Author

ncoghlan commented Aug 21, 2021

I've pushed an update that resolves almost all of my own "PEP 558 TODO" items in the code (and ticked them off in the checklist above as well):

  • the proxy implements the full mutable mapping API (and this support is tested)
  • the proxy now holds off on refreshing the value cache until the first operation that assumes the value cache is up to date (so if you only access individual keys by name, you'll never trigger a full cache refresh, allowing O(1) tracing operation regardless of the number of variables on the frame)
  • the fast refs mapping is now stored on the frame object, so create new frame proxy instances becomes O(1) instead of O(n) (since the fast refs mapping doesn't need to be rebuilt)

I will be adding another todo item to the checklist, which is to explicitly test the handling of cleared frames. Implementing proxy.clear() exposed some inconsistencies between the established FastToLocals() logic and the proxy's behaviour when accessing cell variables via a cleared frame, but no unit tests failed when I changed the behaviour to be consistent with the FastToLocals() precedent.

For the merge conflict, I'm hoping to get #27525 merged before I try to sync this branch with the main branch again. It was my first attempt at doing that which prompted me to file bpo-44800 in the first place, as it was quite unclear how the conflicts should be resolved given the currently checked in variable naming conventions.

@ncoghlan
Copy link
Contributor Author

ncoghlan commented Aug 21, 2021

@ericsnowcurrently:

  • I think what the implementation is doing for "not yet created" cell objects aligns with what you say it should be doing. The fast refs entry for a "not yet created" cell object is like the entry for a local variable: a Python integer giving the location of the cell in the fast locals array. It is treated as an unbound name while it is in that state (i.e. the kind says it should be a cell, but the cell hasn't been created yet). However, each time it is accessed via the proxy for any reason, the proxy checks to see if the cell object has been created by MAKE_CELL since the last time the proxy checked. If it has, it fixes up the fast refs entry to refer directly to that cell object instead of to the fast locals array slot. It's still only MAKE_CELL that will ever create the cell object, though - the proxy just promotes it to the fast refs mapping once it notices it has been created.
  • any extra variables that get set are still stored on the underlying frame rather than each proxy object having its own separate storage location. They're just not stored in the fast locals array - they're stored in the same locals dictionary that they're stored in now (which provides backwards compatibility and interoperability with the PyEval_GetLocals() API). That fast locals dictionary is also used to cache the name->value mapping for the fast locals, both for PyEval_GetLocals() compatibility, and to avoid having to make routine operations on the proxy (like checking how many names are currently bound to values) consistently O(n) (instead the proxy refreshes the cache the first time it needs to use it, and then after that assumes it is still up to date - since proxies are cheap to create now, the easiest way to refresh them is to throw the old proxy away and request a new one, but there's also an explicit cache sync method defined)
  • yes, the proxy implementation still exposes all the surprisingly visible cell references that FastToLocals() does, and it makes attempts to write to them actually work. As you say, doing anything else would almost certainly create a compatibility problem at this point. It does mean that calling clear() on the proxy can have more side effects than calling clear() on the frame, but that's consistent with the way PyFrame_LocalsToFast() has historically worked in tracing functions (deleting a key from f_locals would unbind it when returning from the tracing function, regardless of whether it was a local variable or a cell reference)

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@mentalisttraceur
Copy link

mentalisttraceur commented Jul 4, 2022

I have a concern about raising RuntimeError from PyFrame_LocalsToFast.

Was it possible for PyFrame_LocalsToFast before this change to ever raise RuntimeError for any other reason?

If not, then it would be nice to have that explicitly documented (at least in the PEP, maybe elsewhere if appropriate and not too noisy to include it).

But RuntimeError is raised for internal problems, like if memory allocations fail, right? So it seems like the kind of error type that's liable to fly out of any internal function, with no real guarantees that it can't?

So I think it might be a bit of a challenge for third parties to write code that

  1. work on Python versions before this change,
  2. will work on later Python version after this change, and
  3. won't ever hide any other errors on versions before this change, and
  4. does all that reasonably cleanly.

Because it seems like code released today:

  1. can't start catching just any RuntimeError from PyFrame_LocalsToFast,
  2. doesn't have guarantees about exactly what subclass or error message of RuntimeError to expect based on this PEP, and
  3. can't just check sys.version_info because it seems like we don't have certainty about when this PEP will make it in.

For what it's worth, this isn't insurmountable - in my little toy library that mutates functions' variables through f_locals, I settled on trying to modify a variable in the frame of a test function call, checking if that worked, and only if it didn't, I try to import the locals-to-fast stuff. I think this is probably the best in terms of portability across all versions of all implementations of Python where mutating a frame's locals is possible, and elegantly avoids ever causing the RuntimeError that this PEP adds on any implementation+version of Python that implements this PEP. But it takes all this code, might be fragile in other ways or have other downsides I haven't realized yet, and wasn't trivial to think of.

TL;DR: because change takes PyFrame_LocalsToFast from necessary for certain goals to unusable, the added error should be programmatically easy to distinguish for the code that was using it while it was necessary.

@github-actions github-actions bot removed the stale Stale PR or inactive for long period of time. label Jul 5, 2022
@erlend-aasland erlend-aasland added pending The issue will be closed if no feedback is provided and removed pending The issue will be closed if no feedback is provided labels Jul 24, 2022
@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added stale Stale PR or inactive for long period of time. and removed stale Stale PR or inactive for long period of time. labels Aug 24, 2022
@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Sep 26, 2022
@ncoghlan
Copy link
Contributor Author

Closing as per the deferral notice in https://github.com/python/peps/pull/3050/files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review DO-NOT-MERGE stale Stale PR or inactive for long period of time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.